Stata has a very useful command, -ml-, that can be used to estimate almost any linear or nonlinear model by maximum likelihood.
Properly speaking, this command can be used to obtain any M-type estimator; however, I'll concentrate on maximum likelihood models.
In this small tutorial, I'll provide a short example of how to use -ml- in combination with -margins- to estimate marginal effects for any model of interest, as long as one creates a program that identifies the outcome of interest.
There are many ways one can use -margins- to obtain "correct" marginal effects after estimating MLE models. Typical examples use the -margins- option expression(). I find that approach a bit complicated when the outcome of interest is complex, so I prefer doing it the way Stata does it: creating a small program that defines the outcome of interest.
If you look at the information stored in -e()- by all official Stata commands (and most community-contributed commands), you will find there is something stored in e(predict). This "something" is the name of a program that defines the outcome(s) of interest, which can later be used for other purposes, perhaps most commonly obtaining model predictions or marginal effects.
So, in order to use -margins- with -ml-, we need the ability to modify the information stored in e(predict), and redirect -margins- to a command of our own design.
In order to do so, I like to have the following program at hand:
. program adde, eclass
  1.         ereturn `0'
  2. end
This is perhaps the simplest program you will ever write, and yet one of the most flexible. It simply "adds" any information you want to e().
For example, say you estimate a simple OLS model using the auto dataset and then add some information to e(). Specifically, I'll add a local macro named "tag", stored as e(tag), that says "This is a very simple regression".
. sysuse auto, clear
(1978 Automobile Data)

. reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. adde local tag "This is a very simple regression"
You can see that this was added to e() by typing -ereturn list-. You will also see that -regress- stores in e(predict) the name of a program, -regres_p-, which is used by -margins- and -predict-.
. ereturn list

scalars:
                  e(N) =  74
               e(df_m) =  1
               e(df_r) =  72
                  e(F) =  20.25835256291883
                 e(r2) =  .2195828561874974
               e(rmse) =  2623.652888667586
                e(mss) =  139449473.5462301
                e(rss) =  495615922.5753915
               e(r2_a) =  .2087437291901015
                 e(ll) =  -686.5395809065244
               e(ll_0) =  -695.7128688987767
               e(rank) =  2

macros:
                e(tag) : "This is a very simple regression"
            e(cmdline) : "regress price mpg"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "price"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
             e(sample)
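Once added, e(tag) behaves like any other stored result. For example, you can retrieve it, or the contents of e(predict), directly:

. display "`e(tag)'"
This is a very simple regression

. display "`e(predict)'"
regres_p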
As I mentioned before, -ml- is a very useful Stata command that allows you to obtain MLE estimates, given that you define the appropriate objective function.
-ml- has many different methods (lf, d0, d1, d2, lf0, lf1, lf2, etc.), each requiring you to define the objective function (usually the log-likelihood function) and possibly its gradient or Hessian. For most purposes, however, I find -lf- (the simplest one) to be the only one you may ever need.
When you use -ml- with the -lf- method, you just need to declare the log-likelihood contribution of each observation in the sample.
To ground this concept, let's assume that we want to estimate a simple linear regression using MLE. For this we need to assume either that the error follows a normal distribution, or, equivalently, that the outcome follows a conditionally normal distribution: $$ y_i = x_i \beta + \varepsilon_i $$ $$ y_i \sim N(x_i \beta, \sigma) $$ or $$ \varepsilon_i \sim N(0, \sigma) $$
This implies that the likelihood and log-likelihood of a single observation are equal to: $$ L_i = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left(\frac{y_i-x_i \beta}{\sigma}\right)^2 } $$ $$ LL_i = -\log(\sigma)-\frac{1}{2} \log(2\pi) -\frac{1}{2} \left(\frac{y_i-x_i \beta}{\sigma}\right)^2 $$ So we just need to write a program that defines this log-likelihood function. This is straightforward:
. program myols_mle
  1.         args lnf xb lnsigma
  2.         local sigma exp(`lnsigma')
  3.         qui: replace `lnf' = log(normalden($ML_y1, `xb', `sigma'))
  4. end
Notice that this program has 3 arguments: `lnf', the variable that will hold each observation's log-likelihood contribution; `xb', the linear index for the conditional mean; and `lnsigma', the linear index for the log of the standard deviation.
Notice here that we are not estimating \(\sigma \) directly, because it is numerically more stable to estimate its log (which also guarantees that the implied \(\sigma \) is positive).
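As a quick sanity check of the evaluator, you can re-fit the auto regression from before (output omitted); here /lnsigma declares a constant-only equation for the log standard deviation. Since for the normal linear model the maximum likelihood and OLS estimates of the coefficients coincide, the point estimates for mpg and _cons should match the -regress- output above:

. sysuse auto, clear
. ml model lf myols_mle (price = mpg) /lnsigma, maximize
. ml display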
Now for the interesting part: creating our own "predict" program.
. program myols_mle_p
  1.         syntax newvarname [if] [in], [mean sigma emean *]
  2.         if "`mean'`sigma'`emean'" == "" {
  3.                 ml_p `0'
  4.         }
  5.         marksample touse, novarlist
  6.         if "`mean'" != "" {
  7.                 tempvar xb
  8.                 _predict double `xb', eq(#1)
  9.                 gen `typlist' `varlist' = `xb' if `touse'
 10.                 label var `varlist' "E(y|X)"
 11.         }
 12.         else if "`sigma'" != "" {
 13.                 tempvar xb
 14.                 _predict double `xb', eq(#2)
 15.                 gen `typlist' `varlist' = exp(`xb') if `touse'
 16.                 label var `varlist' "E(sigma|X)"
 17.         }
 18.         else if "`emean'" != "" {
 19.                 tempvar xb lns
 20.                 _predict double `xb', eq(#1)
 21.                 _predict double `lns', eq(#2)
 22.                 local sigma (exp(`lns'))
 23.                 gen `typlist' `varlist' = exp(`xb')*exp(0.5*`sigma'^2) if `touse'
 24.                 label var `varlist' "E(exp(Y)|X)"
 25.         }
 26. end
This program, here named -myols_mle_p-, can be used to estimate 3 things:
- mean: the conditional mean of the outcome, E(y|X), which is just the linear prediction from the first equation
- sigma: the conditional standard deviation, E(sigma|X), obtained by exponentiating the second equation
- emean: the conditional mean of the exponentiated outcome, E(exp(Y)|X), which is useful when the dependent variable is in logs
For the last case, -emean-, we are using the expected value of a lognormal distribution: $$ \log(y_i) \sim N(\mu_y,\sigma_y) $$ $$ E(y_i) = e^{\mu_y} e^{\frac{1}{2}\sigma_y^2} $$
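If you want to convince yourself of this formula, a quick simulation helps (the values \(\mu_y = 1\) and \(\sigma_y = 0.5\) are arbitrary choices); the sample mean of y should come out close to the theoretical value \(e^{1 + 0.5(0.5)^2} \approx 3.08\) displayed by the last line:

. clear
. set seed 123
. set obs 100000
. generate lny = 1 + 0.5*rnormal()
. generate y = exp(lny)
. summarize y
. display exp(1 + 0.5*0.5^2)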
Alright, let's see how it works. First, let's load the data:
. use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
(Excerpt from the Swiss Labor Market Survey 1998)
Then, we estimate the model using -ml-. For this, we specify that the model will be estimated using the "lf" approach, and that the log-likelihood is defined by the program -myols_mle-.
Afterwards, we declare which variables will be used to model the conditional mean and, just for fun, the conditional standard deviation. Both parameters will be modelled as functions of education, experience, and gender. This allows for a specific type of heteroskedasticity.
This is the model we will be estimating:
$$y_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 female_i + \varepsilon_i$$ $$\varepsilon_i \sim N(0,\sigma(x_i)^2)$$ $$\log(\sigma(x_i)) = \gamma_0 + \gamma_1 educ_i + \gamma_2 exper_i + \gamma_3 female_i$$

. ml model lf myols_mle (lnwage = educ exper i.female) (lnsigma:= educ exper i.female), maximize

initial:       log likelihood = -9602.9223
alternative:   log likelihood = -4263.0081
rescale:       log likelihood = -3318.4554
rescale eq:    log likelihood =  -1777.883
Iteration 0:   log likelihood =  -1777.883  (not concave)
Iteration 1:   log likelihood = -1241.2179
Iteration 2:   log likelihood = -1029.3117
Iteration 3:   log likelihood = -876.30431
Iteration 4:   log likelihood = -871.21438
Iteration 5:   log likelihood = -871.18998
Iteration 6:   log likelihood = -871.18998

. ml display

                                                      Number of obs   =  1,434
                                                      Wald chi2(3)    = 371.53
Log likelihood = -871.18998                           Prob > chi2     = 0.0000

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
        educ |   .0736371   .0045872    16.05   0.000     .0646464    .0826278
       exper |   .0105825   .0010551    10.03   0.000     .0085147    .0126504
    1.female |  -.1118167    .024106    -4.64   0.000    -.1590636   -.0645698
       _cons |   2.430164   .0641158    37.90   0.000     2.304499    2.555828
-------------+----------------------------------------------------------------
lnsigma      |
        educ |  -.0356134   .0067794    -5.25   0.000    -.0489008   -.0223259
       exper |  -.0165439   .0016363   -10.11   0.000     -.019751   -.0133368
    1.female |   .2066704   .0382432     5.40   0.000     .1317152    .2816256
       _cons |  -.2813734   .0910696    -3.09   0.002    -.4598665   -.1028802
------------------------------------------------------------------------------
You can compare these results with standard -regress- output, or with -hetregress- if you want a comparison that also allows for heteroskedastic errors.
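For example, the following two commands fit the comparable homoskedastic and heteroskedastic models (keep in mind that -hetregress- parameterizes the log of the variance rather than the log of the standard deviation, so its variance-equation coefficients should be roughly twice the lnsigma coefficients above; also, both commands overwrite the -ml- results in e(), so re-fit the model before continuing):

. regress lnwage educ exper i.female
. hetregress lnwage educ exper i.female, het(educ exper i.female)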
Now, if we want to use our own predict program, we need to "add" it to e():
. adde local predict myols_mle_p
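A nice side effect is that -predict- itself now dispatches to our program, so the three outcomes are also available as regular predictions (the variable names below are just illustrative):

. predict double yhat, mean
. predict double sighat, sigma
. predict double ewage, emean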
And that is pretty much it. We can now use -margins- to estimate the effects of education, experience, and gender on log wages (trivial), on the standard deviation, or on wages in levels.
. margins, dydx(*) predict(mean)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(y|X), predict(mean)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0736371   .0045872    16.05   0.000     .0646464    .0826278
       exper |   .0105825   .0010551    10.03   0.000     .0085147    .0126504
    1.female |  -.1118167    .024106    -4.64   0.000    -.1590636   -.0645698
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(sigma)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(sigma|X), predict(sigma)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0161816   .0031133    -5.20   0.000    -.0222836   -.0100796
       exper |   -.007517    .000774    -9.71   0.000    -.0090341       -.006
    1.female |   .0938119   .0175522     5.34   0.000     .0594102    .1282136
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(emean)

Average marginal effects                        Number of obs     =      1,434
Model VCE    : OIM

Expression   : E(exp(Y)|X), predict(emean)
dy/dx w.r.t. : educ exper 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   2.175891    .169221    12.86   0.000     1.844224    2.507558
       exper |    .236423   .0394785     5.99   0.000     .1590466    .3137995
    1.female |   -2.27482   .8226334    -2.77   0.006    -3.887152   -.6624884
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
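As an extra check on the last set of results, we can compute the analytic derivative of E(exp(Y)|X) with respect to educ and average it over the sample. This is just a sketch using -predictnl- (me_educ is an illustrative name); its mean should be close to the 2.176 that -margins- reports above:

. predictnl double me_educ = exp(xb(eq1) + 0.5*exp(xb(lnsigma))^2)*(_b[eq1:educ] + exp(xb(lnsigma))^2*_b[lnsigma:educ])
. summarize me_educ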
So how do we interpret the results? Here is one possibility:
Each additional year of education increases wages by about 7.4%; however, as education increases, the dispersion of wages decreases. In terms of actual dollar changes, on average, that additional year of education translates into a \$2.17 per hour increase.